Abstract

Given a matrix $A \in \mathbb{R}^{n \times n}$, we present a simple, element-wise sparsification algorithm that
zeroes out all sufficiently small elements of $A$ and then retains some of the remaining elements
with probabilities proportional to the square of their magnitudes. We analyze the approximation
accuracy of the proposed algorithm using a recent, elegant non-commutative Bernstein
inequality, and compare our bounds with all existing (to the best of our knowledge) element-wise
matrix sparsification algorithms.

1 Introduction 

Element-wise matrix sparsification was pioneered by Achlioptas and McSherry [AM01, AM07], who
described sampling-based algorithms to select a small number of elements from an input matrix
$A \in \mathbb{R}^{n \times n}$ in order to construct a sparse sketch $\tilde{A} \in \mathbb{R}^{n \times n}$, which is close to $A$ in the operator
norm. Such sketches were used in approximate eigenvector computations [AM01, AHK06, AM07],
semi-definite programming solvers [AHK05, dA09], and matrix completion problems [CR09, CT10].
Motivated by their work, we present a simple matrix sparsification algorithm that achieves the best
known upper bounds for element-wise matrix sparsification.

Our main algorithm (Algorithm 1) zeroes out "small" elements of $A$ and randomly samples the
remaining elements of $A$ with respect to a probability distribution that favors "larger" entries. In
Algorithm 1, we let $e_1, e_2, \ldots, e_n \in \mathbb{R}^n$ denote the standard basis vectors for $\mathbb{R}^n$ (see Section 3.1 for
more notation). Our sampling procedure selects $s$ entries from $\hat{A}$ (note that $\hat{A}$ from the description of
Algorithm 1 is simply $A$, but with elements less than or equal to $\varepsilon/(2n)$ zeroed out) in $s$ independent,
identically distributed (i.i.d.) trials with replacement. In each trial, elements of $\hat{A}$ are retained with
probability proportional to their squared magnitude. Note that the same element of $\hat{A}$ could be
selected multiple times and that $\tilde{A}$ contains at most $s$ non-zero entries. Theorem 1 is our main
quality-of-approximation result for Algorithm 1 and achieves sparsity bounds proportional to $\|A\|_F^2$.
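
1: Input: $A \in \mathbb{R}^{n \times n}$, accuracy parameter $\varepsilon > 0$.
2: Let $\hat{A} = A$ and zero-out all entries of $\hat{A}$ that are smaller (in absolute value) than $\varepsilon/(2n)$.
3: Set $s$ as in Eqn. (1).
4: For $t = 1, \ldots, s$ (i.i.d. trials with replacement) randomly sample indices $(i_t, j_t)$ (entries of $\hat{A}$), with
   \[ \mathbb{P}\big((i_t, j_t) = (i, j)\big) = p_{ij}, \quad \text{where } p_{ij} := \hat{A}_{ij}^2 / \|\hat{A}\|_F^2 \text{ for all } (i, j) \in [n] \times [n]. \]
5: Output:
   \[ \tilde{A} = \frac{1}{s} \sum_{t=1}^{s} \frac{\hat{A}_{i_t j_t}}{p_{i_t j_t}}\, e_{i_t} e_{j_t}^T \in \mathbb{R}^{n \times n}. \]

Algorithm 1: Matrix Sparsification Algorithm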
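
The steps of Algorithm 1 can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `sparsify`, the list-of-rows input, the dictionary output, the optional `s` override, and rounding the sample size of Eqn. (1) up to an integer are all choices made here for illustration.

```python
import math
import random

def sparsify(A, eps, s=None, rng=None):
    """Illustrative sketch of Algorithm 1 for a square matrix given as a list of rows.

    Returns (sketch, s), where sketch maps (i, j) to the value of the sparse
    sketch at that position and has at most s keys.
    """
    rng = rng or random.Random()
    n = len(A)
    threshold = eps / (2 * n)
    # Step 2: drop entries that are smaller (in absolute value) than eps / (2n).
    entries = [(i, j, A[i][j]) for i in range(n) for j in range(n)
               if abs(A[i][j]) >= threshold]
    if not entries:
        return {}, 0  # the thresholded matrix is all zeros, so is the sketch
    fro_hat_sq = sum(v * v for _, _, v in entries)  # squared Frobenius norm of A-hat
    # Step 3: sample size from Eqn. (1); rounding up is an assumption of this sketch.
    if s is None:
        fro_sq = sum(v * v for row in A for v in row)
        s = math.ceil(28 * n * math.log(math.sqrt(2) * n) * fro_sq / eps ** 2)
    # Step 4: s i.i.d. draws with probability proportional to the squared entry.
    weights = [v * v for _, _, v in entries]
    draws = rng.choices(entries, weights=weights, k=s)
    # Step 5: average the rescaled samples; note v / p_ij = fro_hat_sq / v.
    sketch = {}
    for i, j, v in draws:
        sketch[(i, j)] = sketch.get((i, j), 0.0) + fro_hat_sq / (v * s)
    return sketch, s
```

Passing an explicit `s` is convenient for small experiments, since the value prescribed by Eqn. (1) can be very large when $\varepsilon$ is small relative to $\|A\|_F$.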

Theorem 1 Let $A \in \mathbb{R}^{n \times n}$ be any matrix, let $\varepsilon > 0$ be an accuracy parameter, and let $\tilde{A}$ be the
sparse sketch of $A$ constructed via Algorithm 1. If

\[ s = \frac{28\, n \ln(\sqrt{2}\, n)}{\varepsilon^2}\, \|A\|_F^2, \tag{1} \]

then, with probability at least $1 - n^{-1}$,

\[ \|A - \tilde{A}\|_2 \le \varepsilon. \]

$\tilde{A}$ has at most $s$ non-zero entries and the construction of $\tilde{A}$ can be implemented in one pass over
the input matrix $A$ (see Section 3.2).

We conclude this section with Corollary 1, which is a re-statement of Theorem 1 involving the
stable rank of $A$, denoted by $\mathrm{sr}(A)$ (recall that the stable rank of any matrix $A$ is defined as the
ratio $\mathrm{sr}(A) := \|A\|_F^2 / \|A\|_2^2$, which is upper bounded by the rank of $A$). The corollary guarantees
relative error approximations for matrices of, say, constant stable rank, such as the ones that
arise in [Rec09, CT10].

Corollary 1 Let $A \in \mathbb{R}^{n \times n}$ be any matrix and let $\varepsilon > 0$ be an accuracy parameter. Let $\tilde{A}$ be the
sparse sketch of $A$ constructed via Algorithm 1 (run with accuracy parameter $\varepsilon \|A\|_2$ in place of $\varepsilon$). If $s = 28\, n\, \mathrm{sr}(A) \ln(\sqrt{2}\, n) / \varepsilon^2$,
then, with probability at least $1 - n^{-1}$,

\[ \|A - \tilde{A}\|_2 \le \varepsilon \|A\|_2. \]

It is worth noting that the sampling algorithm implied by Corollary 1 cannot be implemented in
one pass, since we would need a priori knowledge of the spectral norm of $A$ in order to implement
Step 2 of Algorithm 1.
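
To see how the sample size of Corollary 1 relates to Eqn. (1), one can substitute $\varepsilon \|A\|_2$ for $\varepsilon$ in Theorem 1 and use the definition of the stable rank. The following worked step is a sketch of that substitution and uses only quantities defined above:

```latex
\begin{align*}
s \;=\; \frac{28\, n \ln(\sqrt{2}\, n)}{(\varepsilon \|A\|_2)^2}\, \|A\|_F^2
  \;=\; \frac{28\, n \ln(\sqrt{2}\, n)}{\varepsilon^2} \cdot \frac{\|A\|_F^2}{\|A\|_2^2}
  \;=\; \frac{28\, n\, \mathrm{sr}(A) \ln(\sqrt{2}\, n)}{\varepsilon^2}.
\end{align*}
```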



2 Related Work 

In this section (as well as in Table 1), we present a head-to-head comparison of our result with
all existing (to the best of our knowledge) bounds on matrix sparsification. In [AM01, AM07]
the authors presented a sampling method that requires in expectation $16 n \|A\|_F^2 / \varepsilon^2 + 8^4 n \log^4 n$
non-zero entries in $\tilde{A}$ in order to achieve an accuracy guarantee $\varepsilon$ with a failure probability of at
most $e^{-19 \log^4 n}$. Compared with our result, their bound holds only when $\varepsilon > 4 \sqrt{n} \cdot \max_{i,j} |A_{ij}|$
and, in this range, our bounds are superior when $\|A\|_F / (\max_{i,j} |A_{ij}|) = o(n \log n)$. It is worth
mentioning that the constant involved in [AM01, AM07] is two orders of magnitude larger than
ours and, more importantly, that the results of [AM01, AM07] hold only when $n \ge 700 \cdot 10^6$.

In [GT09], the authors study the $\|\cdot\|_{\infty \to 2}$ and $\|\cdot\|_{\infty \to 1}$ norms in the matrix sparsification context
and they also present a sampling scheme analogous to ours. They achieve (in expectation) a sparsity
bound of $R\, n\, \|A\|_F \max_{i,j} |A_{ij}| / \varepsilon^2$ when $\varepsilon > \sqrt{nR}\, \max_{i,j} |A_{ij}|$; here $R = \max_{i,j} |A_{ij}| / \min_{A_{ij} \neq 0} |A_{ij}|$.
Thus, our results are superior (in the above range of $\varepsilon$) when $R \cdot \max_{i,j} |A_{ij}| = \omega(\log n)$.

It is harder to compare our method to the work of [AHK06], which depends on the quantity $\sum_{i,j=1}^{n} |A_{ij}|$.
The latter quantity is, in general, upper bounded only by $n \|A\|_F$, in which case the sampling
complexity of [AHK06] is much worse, namely $O(n^{3/2} \|A\|_F / \varepsilon)$. Finally, the recent bounds on matrix
sparsification via the non-commutative Khintchine's inequality in [NDT09] are inferior compared
to ours in terms of sparsity guarantees by at least $O(\ln^2(n / \ln^2 n))$. However, we should mention
that the bounds of [NDT09] can be extended to multi-dimensional matrices (tensors), whereas our
result does not generalize to this setting; see [NDT10] for details.



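Comparison with Prior Results

Sparsity of $\tilde{A}$ | Exact / Expected | Failure probability | Citation | Comments
-----------------------------------------------------------------------------------------------------------
$16 n \|A\|_F^2 / \varepsilon^2 + 8^4 n \log^4 n$ | Expected | $e^{-19 \log^4 n}$ | [AM07] | $\varepsilon > 4 \sqrt{n} \cdot b$, $n \ge 700 \cdot 10^6$
$R \cdot b \cdot n \|A\|_F / \varepsilon^2$ | Expected | $e^{-\Omega(R \cdot n)}$ | [GT09] | $\varepsilon > c_1 \sqrt{n \cdot R} \cdot b$, $n \ge 1$
$c_2\, n \log^2(n / \log^2 n) \log n\, \|A\|_F^2 / \varepsilon^2$ | Expected | $1/n$ | [NDT09] | $\varepsilon > 0$, $n \ge 300$, $c_2 \le 45^2$
$c_3\, n \log^3 n\, \|A\|_F^2 / \varepsilon^2$ | Expected | $1/n$ | [NDT10] | $\varepsilon > 0$, $n \ge 300$; extends to tensors
$c_4 \sqrt{n} \sum_{i,j} |A_{ij}| / \varepsilon$ | Exact | $e^{-\Omega(n)}$ | [AHK06] | $\varepsilon > 0$, $n \ge 1$
$28\, n \ln(\sqrt{2}\, n) \|A\|_F^2 / \varepsilon^2$ | Exact | $1/n$ | Theorem 1 | $\varepsilon > 0$, $n \ge 1$

Table 1: Summary of prior work in matrix sparsification results. Given a matrix $A \in \mathbb{R}^{n \times n}$ and an
accuracy parameter $\varepsilon > 0$, we seek a sparse $\tilde{A} \in \mathbb{R}^{n \times n}$ such that $\|A - \tilde{A}\|_2 \le \varepsilon$. The first column
indicates the number of non-zero entries in $\tilde{A}$, whereas the second column indicates whether this
number is exact or simply holds in expectation. In terms of notation, we let $b$ denote $\max_{i,j} |A_{ij}|$
and $R$ denote $\max_{i,j} |A_{ij}| / \min_{A_{ij} \neq 0} |A_{ij}|$. Finally, $c_1, c_2, c_3, c_4$ denote unspecified constants.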

3 Background



3.1 Notation 

We let $[n]$ denote the set $\{1, 2, \ldots, n\}$. We will use the notation $\mathbb{P}(\cdot)$ to denote the probability of the
event in the parentheses and $\mathbb{E}(X)$ to denote the expectation of a random variable $X$. When $X$ is
a matrix, $\mathbb{E}(X)$ denotes the element-wise expectation of each entry of $X$. For a matrix $X \in \mathbb{R}^{n \times n}$,
$X^{(j)}$ will denote the $j$-th column of $X$ as a column vector and, similarly, $X_{(i)}$ will denote the
$i$-th row of $X$ as a row vector (for any $i$ or $j$ in $[n]$). The Frobenius norm $\|X\|_F$ of the matrix
$X$ is defined as $\|X\|_F^2 = \sum_{i,j=1}^{n} X_{ij}^2$, and the spectral norm $\|X\|_2$ of the matrix $X$ is defined as
$\|X\|_2 = \max_{\|y\|_2 = 1} \|Xy\|_2$. For two symmetric matrices $X, Y$ we say that $Y \succeq X$ if and only if
$Y - X$ is a positive semi-definite matrix. Finally, $I_n$ denotes the identity matrix of size $n$ and $\ln x$
denotes the natural logarithm of $x$.



3.2 Implementing the Sampling in one Pass over the Input Matrix 

We now discuss the implementation of Algorithm 1 in one pass over the input matrix $A$. Towards
that end, we will leverage (a slightly modified version of) Algorithm Select (p. 137 of [DKM06]).

1: Input: $A_{ij}$ for all $(i, j) \in [n] \times [n]$, arbitrarily ordered, and $\varepsilon > 0$.
2: $N = 0$.
3: For all $(i, j) \in [n] \times [n]$ such that $A_{ij}^2 > \varepsilon^2 / (4n^2)$:
   • $N = N + A_{ij}^2$
   • Set $(I, J) = (i, j)$ and $S = A_{ij}$ with probability $A_{ij}^2 / N$.
4: Output: Return $(I, J)$, $S$ and $N$.

Algorithm 2: One-pass Select algorithm

We note that Step 3 of Algorithm 2 essentially operates on $\hat{A}$. Clearly, in a single pass over the data we can run
in parallel $s$ copies of the Select Algorithm (using a total of $O(s)$ memory) to effectively return $s$
independent samples from $\hat{A}$. Lemma 1 (page 136 of [DKM06]; note that the sequence of the $\hat{A}_{ij}^2$'s
is all-positive) guarantees that each of the $s$ copies of Select returns a sample satisfying:

\[ \mathbb{P}\big((i_t, j_t) = (i, j)\big) = \frac{\hat{A}_{ij}^2}{\sum_{k,l=1}^{n} \hat{A}_{kl}^2} = \frac{\hat{A}_{ij}^2}{\|\hat{A}\|_F^2} \quad \text{for all } t = 1, \ldots, s. \]

Finally, in the parlance of Step 5 of Algorithm 1, $(i_t, j_t)$ is set to $(I, J)$ and $p_{i_t j_t}$ is set to $S^2 / N$ for
all $t \in [s]$.
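
The one-pass procedure can be sketched as follows. This is a minimal illustration, assuming the matrix arrives as a stream of `(i, j, value)` triples; the function name `select_samples` and the choice to run the $s$ copies inside a single loop are assumptions of this sketch, not part of [DKM06].

```python
import random

def select_samples(stream, n, eps, s, rng=None):
    """Run s independent copies of a Select-style sampler in one pass.

    stream yields (i, j, a_ij) triples in arbitrary order. Returns (picks, N),
    where picks[t] is the (i, j, a_ij) triple held by copy t and N is the sum
    of squares of all retained entries.
    """
    rng = rng or random.Random()
    cutoff = eps ** 2 / (4 * n ** 2)
    N = 0.0
    picks = [None] * s
    for i, j, a in stream:
        w = a * a
        if w < cutoff:
            continue  # entries below eps / (2n) in magnitude are skipped
        N += w
        # Each copy independently replaces its current pick with probability w / N.
        for t in range(s):
            if rng.random() < w / N:
                picks[t] = (i, j, a)
    return picks, N
```

For a returned triple `(i, j, a)`, the sampling probability used in Step 5 of Algorithm 1 is `a * a / N`. Only the `s` current picks and the running sum `N` are stored, which matches the $O(s)$ memory noted above.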



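4 Proof of Theorem 1

The proof of Theorem 1 will combine Lemmas 1 and 4 in order to bound $\|A - \tilde{A}\|_2$ as follows:

\[ \|A - \tilde{A}\|_2 = \|A - \hat{A} + \hat{A} - \tilde{A}\|_2 \le \|A - \hat{A}\|_2 + \|\hat{A} - \tilde{A}\|_2 \le \varepsilon/2 + \varepsilon/2 = \varepsilon. \]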
The failure probability of Theorem 1 emerges from Lemma 4, which fails with probability at most
$n^{-1}$ for the choice of $s$ in Eqn. (1). The proof of Lemma 4 will involve an elegant matrix-valued
Bernstein bound proven in [Rec09]. See also [Gro09] or [Tro10, Theorem 2.10] for similar bounds.
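
As a quick check of how the sample size in Eqn. (1) matches the requirement of Lemma 4, namely $s \ge 14\, n\, \varepsilon^{-2} \|A\|_F^2 \ln(2n/\delta)$, one can set the failure probability to $\delta = 1/n$. The worked step below is a sketch using only the statements of Theorem 1 and Lemma 4:

```latex
\begin{align*}
14\, n\, \varepsilon^{-2} \|A\|_F^2 \ln\!\left(\frac{2n}{\delta}\right)\bigg|_{\delta = 1/n}
  &= 14\, n\, \varepsilon^{-2} \|A\|_F^2 \ln(2 n^2) \\
  &= 14\, n\, \varepsilon^{-2} \|A\|_F^2 \cdot 2 \ln(\sqrt{2}\, n)
   = \frac{28\, n \ln(\sqrt{2}\, n)}{\varepsilon^2}\, \|A\|_F^2 .
\end{align*}
```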



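4.1 Bounding $\|A - \hat{A}\|_2$

Lemma 1 Using the notation of Algorithm 1, $\|A - \hat{A}\|_2 \le \varepsilon/2$.

Proof: Recall that the entries of $\hat{A}$ are either equal to the corresponding entries of $A$ or they are
set to zero if the corresponding entry of $A$ is (in absolute value) smaller than $\varepsilon/(2n)$. Thus,

\[ \|A - \hat{A}\|_2^2 \le \|A - \hat{A}\|_F^2 = \sum_{i,j=1}^{n} \big(A - \hat{A}\big)_{ij}^2 \le \sum_{i,j=1}^{n} \frac{\varepsilon^2}{4n^2} = \frac{\varepsilon^2}{4}. \]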
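
The thresholding bound of Lemma 1 is easy to check numerically on a small example. The snippet below is an illustrative sketch (the helper names `threshold` and `frobenius_distance` are hypothetical); it verifies the Frobenius-norm bound, which implies the spectral-norm bound because the spectral norm never exceeds the Frobenius norm.

```python
import math

def threshold(A, eps):
    """Zero out entries of A whose magnitude is below eps / (2n)."""
    n = len(A)
    cutoff = eps / (2 * n)
    return [[v if abs(v) >= cutoff else 0.0 for v in row] for row in A]

def frobenius_distance(A, B):
    """Frobenius norm of A - B for two matrices given as lists of rows."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))

A = [[1.0, 0.01], [-0.02, 0.5]]
eps = 0.1
A_hat = threshold(A, eps)
# Each dropped entry is below eps / (2n), so the distance is at most eps / 2.
assert frobenius_distance(A, A_hat) <= eps / 2
```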



4.2 Bounding $\|\hat{A} - \tilde{A}\|_2$

In order to prove our main result in this section (Lemma 4) we will leverage a powerful matrix-valued
Bernstein bound originally proven in [Rec09] (Theorem 3.2). We restate this theorem,
slightly rephrased to better suit our notation.



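Theorem 2 [Theorem 3.2 of [Rec09]] Let $M_1, M_2, \ldots, M_s$ be independent, zero-mean random
matrices in $\mathbb{R}^{n \times n}$. Suppose $\max_{t \in [s]} \{ \|\mathbb{E}(M_t M_t^T)\|_2, \|\mathbb{E}(M_t^T M_t)\|_2 \} \le \rho^2$ and $\|M_t\|_2 \le \gamma$ for all
$t \in [s]$. Then, for any $\tau > 0$,

\[ \Big\| \frac{1}{s} \sum_{t=1}^{s} M_t \Big\|_2 \le \tau \]

holds, subject to a failure probability of at most

\[ 2n \exp\left( - \frac{s \tau^2 / 2}{\rho^2 + \gamma \tau / 3} \right). \]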
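
The failure probability in Theorem 2 is simple to evaluate, and it can be inverted to obtain a sufficient number of samples for a target failure probability $\delta$. The helpers below are a minimal sketch; the function names and the rounding up to an integer are assumptions made here for illustration.

```python
import math

def bernstein_failure_bound(n, s, tau, rho_sq, gamma):
    """Evaluate 2n * exp(-(s * tau^2 / 2) / (rho^2 + gamma * tau / 3))."""
    return 2 * n * math.exp(-(s * tau ** 2 / 2) / (rho_sq + gamma * tau / 3))

def min_samples(n, delta, tau, rho_sq, gamma):
    """Smallest integer s for which the bound above is at most delta."""
    return math.ceil(2 * (rho_sq + gamma * tau / 3) * math.log(2 * n / delta) / tau ** 2)

# Example with arbitrary illustrative parameters.
s = min_samples(n=10, delta=0.01, tau=0.1, rho_sq=1.0, gamma=2.0)
assert bernstein_failure_bound(10, s, 0.1, 1.0, 2.0) <= 0.01
```

The bound decays exponentially in $s$, so the required number of samples grows only logarithmically in $n/\delta$ once $\tau$, $\rho^2$, and $\gamma$ are fixed.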



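In order to apply the above theorem, using the notation of Algorithm 1, we set $M_t = \frac{\hat{A}_{i_t j_t}}{p_{i_t j_t}} e_{i_t} e_{j_t}^T - \hat{A}$
for all $t \in [s]$ to obtain

\[ \frac{1}{s} \sum_{t=1}^{s} M_t = \frac{1}{s} \sum_{t=1}^{s} \left[ \frac{\hat{A}_{i_t j_t}}{p_{i_t j_t}} e_{i_t} e_{j_t}^T - \hat{A} \right] = \tilde{A} - \hat{A}. \tag{2} \]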



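Let $0_n$ denote the all-zeros matrix of size $n$. It is easy to argue that $\mathbb{E}(M_t) = 0_n$ for all $t \in [s]$.
Indeed, if we consider that $\sum_{i,j=1}^{n} p_{ij} = 1$ and $\hat{A} = \sum_{i,j=1}^{n} \hat{A}_{ij} e_i e_j^T$ we obtain

\[ \mathbb{E}(M_t) = \sum_{i,j=1}^{n} p_{ij} \left( \frac{\hat{A}_{ij}}{p_{ij}} e_i e_j^T - \hat{A} \right) = \sum_{i,j=1}^{n} \hat{A}_{ij} e_i e_j^T - \sum_{i,j=1}^{n} p_{ij} \hat{A} = 0_n. \]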
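
The zero-mean property can also be confirmed by direct enumeration on a small matrix. The sketch below (with the hypothetical helper name `expected_M`) computes the expectation exactly as a weighted sum over all entries with non-zero probability.

```python
def expected_M(A_hat):
    """Return E(M_t) as a list of rows, summing over all entries with p_ij > 0."""
    n = len(A_hat)
    fro_sq = sum(v * v for row in A_hat for v in row)
    E = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            p = A_hat[i][j] ** 2 / fro_sq
            if p == 0.0:
                continue  # zero entries are never sampled
            # Contribution of p * (A_hat_ij / p) e_i e_j^T at position (i, j).
            E[i][j] += p * (A_hat[i][j] / p)
            # Contribution of -p * A_hat to every position.
            for k in range(n):
                for l in range(n):
                    E[k][l] -= p * A_hat[k][l]
    return E

E = expected_M([[3.0, 0.0], [1.0, -4.0]])
assert all(abs(v) < 1e-12 for row in E for v in row)
```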



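Our next lemma bounds $\|M_t\|_2$ for all $t \in [s]$.

Lemma 2 Using our notation, $\|M_t\|_2 \le 4 n \varepsilon^{-1} \|\hat{A}\|_F^2$ for all $t \in [s]$.

Proof: First, using the definition of $M_t$ and the fact that $p_{i_t j_t} = \hat{A}_{i_t j_t}^2 / \|\hat{A}\|_F^2$,

\[ \|M_t\|_2 = \left\| \frac{\hat{A}_{i_t j_t}}{p_{i_t j_t}} e_{i_t} e_{j_t}^T - \hat{A} \right\|_2 \le \frac{\|\hat{A}\|_F^2}{|\hat{A}_{i_t j_t}|} + \|\hat{A}\|_2 \le \frac{2n}{\varepsilon} \|\hat{A}\|_F^2 + \|\hat{A}\|_F. \]

The last inequality follows since all non-zero entries of $\hat{A}$ are at least $\varepsilon/(2n)$ in absolute value (only non-zero entries can be sampled) and the fact that $\|\hat{A}\|_2 \le \|\hat{A}\|_F$.
We can now assume that

\[ \|\hat{A}\|_2 \le \|\hat{A}\|_F \le \frac{2n}{\varepsilon} \|\hat{A}\|_F^2 \tag{3} \]

to conclude the proof of the lemma. To justify our assumption in Eqn. (3), we note that if it is
violated, then it must be the case that $\|\hat{A}\|_F < \varepsilon/(2n)$. If that were true, then all entries of $\hat{A}$
would be equal to zero. (Recall that all entries of $\hat{A}$ are either zero or, in absolute value, larger
than $\varepsilon/(2n)$.) Also, if $\hat{A}$ were identically zero, then (i) $\tilde{A}$ would also be identically zero and, (ii) all
entries of $A$ would be at most $\varepsilon/(2n)$. Thus,

\[ \|A - \tilde{A}\|_2 = \|A\|_2 \le \|A\|_F \le \sqrt{n^2 \cdot \frac{\varepsilon^2}{4n^2}} = \frac{\varepsilon}{2}. \]

Thus, if the assumption of Eqn. (3) is not satisfied, the resulting all-zeros $\tilde{A}$ still satisfies Theorem 1.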



Our next step towards applying Theorem 2 involves bounding the spectral norm of the expectation
of $M_t M_t^T$. The spectral norm of the expectation of $M_t^T M_t$ admits a similar analysis and the same
bound and is omitted.



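Lemma 3 Using our notation, $\|\mathbb{E}(M_t M_t^T)\|_2 \le n \|\hat{A}\|_F^2$ for any $t \in [s]$.

Proof: We start by evaluating $\mathbb{E}(M_t M_t^T)$; recall that $p_{ij} = \hat{A}_{ij}^2 / \|\hat{A}\|_F^2$:

\begin{align*}
\mathbb{E}(M_t M_t^T)
 &= \mathbb{E}\left( \left( \frac{\hat{A}_{i_t j_t}}{p_{i_t j_t}} e_{i_t} e_{j_t}^T - \hat{A} \right) \left( \frac{\hat{A}_{i_t j_t}}{p_{i_t j_t}} e_{i_t} e_{j_t}^T - \hat{A} \right)^T \right) \\
 &= \sum_{i,j=1}^{n} p_{ij} \left( \frac{\hat{A}_{ij}}{p_{ij}} e_i e_j^T - \hat{A} \right) \left( \frac{\hat{A}_{ij}}{p_{ij}} e_j e_i^T - \hat{A}^T \right) \\
 &= \sum_{i,j=1}^{n} \left( \frac{\hat{A}_{ij}^2}{p_{ij}} e_i e_i^T - \hat{A}_{ij} \hat{A} e_j e_i^T - \hat{A}_{ij} e_i e_j^T \hat{A}^T + p_{ij} \hat{A} \hat{A}^T \right) \\
 &= \|\hat{A}\|_F^2 \sum_{i=1}^{n} m_i \cdot e_i e_i^T - \sum_{j=1}^{n} \hat{A} e_j \left( \sum_{i=1}^{n} \hat{A}_{ij} e_i \right)^T - \sum_{j=1}^{n} \left( \sum_{i=1}^{n} \hat{A}_{ij} e_i \right) \left( \hat{A} e_j \right)^T + \sum_{i,j=1}^{n} p_{ij} \hat{A} \hat{A}^T,
\end{align*}

where $m_i$ is the number of non-zeroes of the $i$-th row of $\hat{A}$ (entries with $p_{ij} = 0$ do not contribute to the sums). We now simplify the above result using a
few simple observations: $\sum_{i,j=1}^{n} p_{ij} = 1$, $\hat{A} e_j = \hat{A}^{(j)}$, $\sum_{i=1}^{n} \hat{A}_{ij} e_i = \hat{A}^{(j)}$, and $\sum_{j=1}^{n} \hat{A}^{(j)} (\hat{A}^{(j)})^T = \hat{A} \hat{A}^T$. Thus, we get

\begin{align*}
\mathbb{E}(M_t M_t^T)
 &= \|\hat{A}\|_F^2 \sum_{i=1}^{n} m_i \cdot e_i e_i^T - \sum_{j=1}^{n} \hat{A}^{(j)} (\hat{A}^{(j)})^T - \sum_{j=1}^{n} \hat{A}^{(j)} (\hat{A}^{(j)})^T + \hat{A} \hat{A}^T \\
 &= \|\hat{A}\|_F^2 \sum_{i=1}^{n} m_i \cdot e_i e_i^T - \hat{A} \hat{A}^T.
\end{align*}

Since $0 \le m_i \le n$ and using Weyl's inequality (Theorem 4.3.1 of [HJ90]), which states that by
adding a positive semi-definite matrix to a symmetric matrix all its eigenvalues will increase, we
get that

\[ - \hat{A} \hat{A}^T \preceq \mathbb{E}(M_t M_t^T) \preceq n \|\hat{A}\|_F^2 I_n. \]

Consequently,

\[ \|\mathbb{E}(M_t M_t^T)\|_2 \le \max\left\{ \|\hat{A}\|_2^2,\; n \|\hat{A}\|_F^2 \right\} = n \|\hat{A}\|_F^2. \]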

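We can now apply Theorem 2 on Eqn. (2) with $\tau = \varepsilon/2$, $\gamma = 4 n \varepsilon^{-1} \|\hat{A}\|_F^2$ (Lemma 2), and
$\rho^2 = n \|\hat{A}\|_F^2$ (Lemma 3). Thus, we get that $\|\hat{A} - \tilde{A}\|_2 \le \varepsilon/2$ holds, subject to a failure probability
of at most

\[ 2n \exp\left( - \frac{\varepsilon^2 s}{8 (1 + 4/6)\, n\, \|\hat{A}\|_F^2} \right). \]

Bounding the failure probability by $\delta$ and solving for $s$, we get that

\[ s \ge \frac{14}{\varepsilon^2}\, n\, \|\hat{A}\|_F^2 \ln\left( \frac{2n}{\delta} \right). \]

Using $\|\hat{A}\|_F \le \|A\|_F$ (by construction) concludes the proof of the following lemma, which is the
main result of this section.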

Lemma 4 Using the notation of Algorithm 1, if $s \ge 14\, n\, \varepsilon^{-2} \|A\|_F^2 \ln(2n/\delta)$, then, with probability
at least $1 - \delta$,

\[ \|\hat{A} - \tilde{A}\|_2 \le \varepsilon/2. \]
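
The constant 14 in Lemma 4 can be traced back to the parameters plugged into Theorem 2. The following worked step is a sketch of that arithmetic, assuming the values $\tau = \varepsilon/2$, $\gamma = 4 n \varepsilon^{-1} \|\hat{A}\|_F^2$, and $\rho^2 = n \|\hat{A}\|_F^2$ from Lemmas 2 and 3:

```latex
\begin{align*}
\frac{s \tau^2}{2} &= \frac{s\, \varepsilon^2}{8},
\qquad
\rho^2 + \frac{\gamma \tau}{3}
  = n \|\hat{A}\|_F^2 + \frac{4n}{\varepsilon} \|\hat{A}\|_F^2 \cdot \frac{\varepsilon}{6}
  = \Big(1 + \frac{4}{6}\Big)\, n\, \|\hat{A}\|_F^2, \\
2n \exp\!\left( - \frac{\varepsilon^2 s}{8 (1 + 4/6)\, n\, \|\hat{A}\|_F^2} \right) \le \delta
  &\iff
  s \ge \frac{40}{3} \cdot \frac{n\, \|\hat{A}\|_F^2}{\varepsilon^2} \ln\!\left( \frac{2n}{\delta} \right),
\qquad \frac{40}{3} \approx 13.34 \le 14 .
\end{align*}
```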